Urdu Localization Project
Author
Abstract
Pakistan has a population of 140 million, speaking more than 56 different languages. Urdu, the national language of Pakistan, is the lingua franca of these people, as many speak it as a second language. As a developing population, Pakistanis need access to information. Most of the information available over the ICT infrastructure is only in English, yet only 5-10% of the population is familiar with English. Therefore, the Government of Pakistan has embarked on a project to develop software that automatically translates information available in English into Urdu. The project will also convert Urdu text to speech, extending this information to the illiterate population as well. This paper gives an overview of the project's architecture and briefly describes its three components: the Urdu Lexicon, the English to Urdu Machine Translation System, and the Urdu Text to Speech System.
Similar resources
Urdu and the Parallel Grammar Project
We report on the role of the Urdu grammar in the Parallel Grammar (ParGram) project (Butt et al., 1999; Butt et al., 2002). The ParGram project was designed to use a single grammar development platform and a unified methodology of grammar writing to develop large-scale grammars for typologically different languages. At the beginning of the project, three typologically similar European grammars...
Urdu in a parallel grammar development environment
Abstract. In this paper, we report on the role of the Urdu grammar in the Parallel Grammar (ParGram) project (Butt et al., 1999; Butt et al., 2002). The Urdu grammar was able to take advantage of standards in analyses set by the original grammars in order to speed development. However, novel constructions, such as correlatives and extensive complex predicates, resulted in expansions of the anal...
Qualitative Analysis of Contemporary Urdu Machine Translation Systems
The diversity in source and target languages coupled with source language ambiguity makes Machine Translation (MT) an exceptionally hard problem. The highly information intensive corpus based MT leads the MT research field today, with Example Based MT and Statistical MT representing two dissimilar frameworks in the data-driven paradigm. Example Based MT is another approach that involves matchin...
Urdu Correlatives: Theoretical and Implementational Issues
The inclusion of South Asian languages in multilingual grammar development projects that were initially based on European languages has resulted in a number of interesting extensions to those projects. Butt and King (2002) report on the inclusion of Urdu in the Parallel Grammar Project (ParGram; Butt et al. (1999, 2002)) with respect to case and complex predicates. In this paper, we focus on a ...
Holistic Approach for Urdu Character Recognition Using Modified HMM
Automatic recognition of cursive handwritten script remains a challenging problem even with the promising improvement in classifier and computational power. Segmentation based approach for recognition of handwritten Urdu script has considerable computational overhead and has lower accuracy as compared to Roman and Chinese script due to additional segmentation error. Presence of complimentary ch...